Automated design of efficient fail-safe fault tolerance

Author

  • Arshad Jhumka
Abstract

Both the scale and the reach of computer systems and embedded devices have been constantly increasing over the last decade. As such systems become pervasive, our reliance on them increases, and so does our expectation that they continuously deliver services even in the presence of faults; that is, we expect computer systems to be dependable. One way to ensure the continuous delivery of dependable services is replication, which, however, is expensive, so we focus on a cheaper alternative: software-based fault tolerance. Different levels of fault tolerance can be provided, for example masking fault tolerance and fail-safe fault tolerance. In this thesis, we focus on providing fail-safe fault tolerance. Intuitively, a fail-safe fault-tolerant program is one that is allowed to “halt” when faults occur, as long as it always remains in a “safe” state. Moreover, we endeavor to synthesize efficient fail-safe fault tolerance. We use two commonly used criteria to assess the efficiency of a fail-safe fault-tolerant program, namely (i) error detection latency (or latency for short), i.e., how fast a fail-safe fault-tolerant program can detect an erroneous state, and (ii) error detection coverage (or coverage for short), i.e., the ratio of “harmful” errors the program can detect. In this thesis, we present a formal framework for the design of efficient fail-safe fault-tolerant programs. The framework is based on a refined theory of detectors, which introduces novel insights into their working principles. We introduce the concept of a perfect detector, which allows a fail-safe fault-tolerant program to have perfect detection. This means that a program composed with perfect detectors has optimal detection coverage, optimal in the sense that the detectors detect all of the “harmful” errors and make no mistakes.
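To make the two efficiency criteria concrete, here is a minimal Python sketch built on a toy model that is an assumption, not taken from the thesis: a counter `x` advances by a step `d` (normally 1) until it reaches `LIMIT`, the safety specification forbids `x` from ever exceeding `LIMIT`, and a single fault corrupts `d`. An error is treated as “harmful” if the undetected run would then execute a safety-violating transition. Coverage is the fraction of harmful errors a detector catches before the violation, a false alarm is a halt on a harmless error (a “mistake”), and latency is the number of steps between the fault and the halt. All names (`LIMIT`, `run`, `evaluate`, the three detectors) are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical toy model for illustration only (not the thesis's formalism):
# a counter x advances by step d until it reaches LIMIT.
# Safety specification: x must never exceed LIMIT.
LIMIT = 10
FAULT_VALUES = (2, 3, 4)  # a single fault corrupts d (normally 1) to one of these

# A detector inspects the state (x, d) and returns True if the action may run.
Detector = Callable[[int, int], bool]


def run(detector: Optional[Detector], fault_step: int, fault_value: int):
    """Simulate one run; return (outcome, latency), where outcome is
    'ok', 'halted' (fail-safe stop) or 'violated' (safety broken)."""
    x, d, step = 0, 1, 0
    while x < LIMIT:  # guard of the original, fault-intolerant program
        if step == fault_step:
            d = fault_value  # the fault strikes just before this step
        if detector is not None and not detector(x, d):
            return "halted", step - fault_step  # latency in steps
        if x + d > LIMIT:
            return "violated", None  # this transition would break safety
        x += d
        step += 1
    return "ok", None


def evaluate(detector: Detector) -> dict:
    """Inject every single fault and measure coverage, mistakes and latency."""
    harmful = detected = false_alarms = 0
    latencies = []
    for k in range(LIMIT):  # a fault-free run takes exactly LIMIT steps
        for v in FAULT_VALUES:
            baseline, _ = run(None, k, v)  # same fault, no detector
            outcome, latency = run(detector, k, v)
            if baseline == "violated":  # a "harmful" error
                harmful += 1
                if outcome == "halted":
                    detected += 1
                    latencies.append(latency)
            elif outcome == "halted":  # detector fired on a harmless error
                false_alarms += 1
    return {
        "coverage": detected / harmful,
        "false_alarms": false_alarms,
        "max_latency": max(latencies, default=None),
    }


# Hypothetical detectors; none is claimed to be the thesis's construction.
DETECTORS = {
    # Only checks that d is "small": misses some harmful errors, flags harmless ones.
    "lax": lambda x, d: d <= 2,
    # Blocks exactly the violating transition, but only at the last moment.
    "exact_late": lambda x, d: x + d <= LIMIT,
    # Fires as soon as a violation becomes inevitable in this toy model.
    "exact_early": lambda x, d: (LIMIT - x) % d == 0,
}

if __name__ == "__main__":
    for name, det in DETECTORS.items():
        print(name, evaluate(det))
```

In this toy setting the `lax` detector reaches only 75% coverage and raises 5 false alarms, while both exact detectors catch every harmful error with no false alarms. They differ only in latency: `exact_late` halts up to 4 steps after the fault, whereas `exact_early` halts at the step the fault occurs, because after a single fault the quantity `(LIMIT - x) % d` does not change along the run.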
Then, we present the concept of fast detection, and show how a fail-safe fault-tolerant program can have both perfect and fast error detection. In fact, the detection latency is shown to be minimal, i.e., the error is detected


Similar resources

Designing Efficient Fail-Safe Multitolerant Systems

In this paper, we propose a method for designing efficient fail-safe multitolerant systems. A multitolerant system is one that is able to tolerate multiple types of faults, and a fail-safe multitolerant system handles the various fault types in a fail-safe manner. Efficiency issues of interest are fault tolerance-related, and they are: (i) completeness, and (ii) accuracy. Based on earlier work,...


A Fail-Safe CMOS Logic Gate

This paper reports a design technique to make complex CMOS gates fail-safe for a class of faults. Two classes of faults are defined. The fail-safe design presented has limited fault-tolerance capability. Multiple faults are also covered.


A Framework for the Design and Validation of Efficient Fail-Safe Fault-Tolerant Programs

We present a framework that facilitates synthesis and validation of fail-safe fault-tolerant programs. Starting from a fault-intolerant program, with safety specification SS, that satisfies its specification in the absence of faults, we present an approach that automatically transforms it into a fail-safe fault-tolerant program, through the addition of a class of detectors termed as SS-globally...


Modeling Fault-tolerant Distributed Systems for Discrete Controller Synthesis

Embedded systems require safe design methods based on formal methods, as well as safe execution based on fault-tolerance techniques. We propose a safe design method for safe execution systems: it uses discrete controller synthesis (DCS) to generate a correct reconfiguring system. The properties enforced concern consistent execution, functionality fulfillment (whatever the faults, under some fai...


Byzantine Fault Tolerant Authentication

A Byzantine fault tolerant public key infrastructure is presented. It aims to fulfill the authentication requirements of large distributed systems consisting of semi-trusted parties. The distributed trust model does not demand the existence of predefined trusted parties and provides authentication if more than a threshold of the participants are honest. A voting based protocol implements distri...




Publication date: 2003